Speech processing system based upon a representational state transfer (REST) architecture that uses Web 2.0 concepts for speech resource interfaces

ABSTRACT

A speech processing system can include a client, a speech for Web 2.0 system, and a speech processing system. The client can access a speech-enabled application using at least one Web 2.0 communication protocol. For example, a standard browser of the client can use a standard protocol to communicate with the speech-enabled application executing on the speech for Web 2.0 system. The speech for Web 2.0 system can access a data store within which user specific speech parameters are included, wherein a user of the client is able to configure the specific speech parameters of the data store. Suitable ones of these speech parameters are utilized whenever the user interacts with the Web 2.0 system. The speech processing system can include one or more speech processing engines. The speech processing system can interact with the speech for Web 2.0 system to handle speech processing tasks associated with the speech-enabled application.

BACKGROUND

1. Field of the Invention

The present invention relates to the field of speech processing technologies and, more particularly, to a speech processing system based upon a Representational State Transfer (REST) architecture that uses Web 2.0 concepts for speech resource interfaces.

2. Description of the Related Art

In the past, companies having a Web presence thrived by providing as many people broad access to as much information as possible. Information flow was unidirectional, from a company to information consumers. As time has progressed, users have become inundated with too much information from too many sources. Successful Web sites began to provide user-facing information management and information filtration mechanisms designed to aid users in identifying information of interest. Even these Web sites were somewhat flawed in the sense that information still flowed in a unidirectional manner. A user was limited to information gathered and groomed by a particular information provider.

A new type of Web application began to emerge which emphasized user interactions and two-way information exchange. These new Web applications operated more as information marketplaces where people shared information and not as information depots where users accessed a semi-static reservoir of information. This new Web and set of Web applications can be referred to as Web 2.0, where Web 2.0 signifies a second generation of Web based services and applications that emphasize online collaboration and information sharing among users. In other words, a Web 1.0 application would be one that was effectively read-only from a user perspective, whereas a Web 2.0 application would provide read, write, and update access to end-users. Web 2.0 users can fundamentally change a Web 2.0 application.

Specific examples of Web 2.0 instances include WIKIs, BLOGs, social networking sites, FOLKSONOMIEs, MASHUPs, and the like. All of these Web 2.0 instances allow end-users to add content which other users are able to access. A value of a Web 2.0 Web site is enhanced by the user-provided content and may even be completely dependent upon it.

For example, WIKIPEDIA (i.e., one Web 2.0 application) is a WIKI based encyclopedia where each end-user is able to view, add, and edit content. No content would exist without end-user contributions. Information accuracy results from an end-user population constantly updating erroneous entries which other users provide. As new innovations emerge, customers update and add WIKIPEDIA entries that describe these new innovations. Other examples of Web 2.0 applications include MYSPACE.com, YOUTUBE.com, DEL.ICIO.US.com, CRAIGSLIST.com, and the like.

Currently, a schism exists between speech processing technologies and Web 2.0 applications, meaning that Web 2.0 instances do not generally incorporate speech processing technologies. One reason for this is that conventional interfaces to speech resources are too complex for an average end-user to utilize. For this reason, speech technologies are typically only available from Web sites/services that provide a unidirectional flow of information. For example, speech technologies are commonly used by enterprises to handle routine customer interactions via a telephone interface, such as providing bank balances and the like.

One problem contributing to the schism is that speech processing technologies are currently implemented using a non-uniform interface, whereas Web 2.0 is generally based upon a uniform interface. That is, speech processing operations are accessed via function calls, method invocations, remote procedure calls (RPC), and other messages that are only understood by a specific server or a small subset of components. A specific invocation mechanism and required parameters must be known by a client and must be integrated into an interface. A non-uniform interface is characteristic of RPC based techniques, which include Simple Object Access Protocol (SOAP), Common Object Request Broker Architecture (CORBA), Distributed Component Object Model (DCOM), JINI, and the like. Without deliberate integration efforts, however, the chances that two software objects designed from an unconstrained architecture will interoperate are near nil. At best, an ad hoc collection of software objects having vastly different interface requirements results from the RPC style architecture. The lack of uniform interfaces makes integrating speech processing capabilities for each RPC based application a unique endeavor fraught with application specific challenges, which usually require significant speech processing design skills to overcome.

In contrast, a uniform interface exists that includes a few basic primitive commands (e.g., GET, PUT, POST, DELETE) that act upon targets, which in a Web 2.0 context are generally able to be referenced by Uniform Resource Identifiers (URIs). A term used for this type of architecture is Representational State Transfer (REST). REST based solutions simplify component implementation, reduce the complexity of connector semantics, improve the effectiveness of performance tuning, and increase the scalability of pure server components. The Web (e.g., hypertext technologies) in general is founded upon REST principles. Web 2.0 expands these REST principles to permit end users to add (HTTP PUT), update (HTTP POST), and remove (HTTP DELETE) content. Thus, WIKIs, BLOGs, FOLKSONOMIEs, MASHUPs, and the like are all considered RESTful, since each generally follows REST principles.

What is needed to bridge the gap between speech processing resources and conventional Web 2.0 applications is a new paradigm for interfacing with speech processing resources, which makes speech processing resources more available to end-users. In this contemplated paradigm, end-users would optimally be able to cooperatively and dynamically develop speech-enabled solutions, which the end-users would then be able to integrate into Web 2.0 content. Thus, a more robust Web 2.0 environment that incorporates speech processing technologies will be allowed to evolve. This is a stark contrast with a conventional paradigm for interfacing with speech processing resources, which is decidedly non-RESTful in nature.

SUMMARY OF THE INVENTION

The present invention discloses a RESTful speech processing system that uses Web 2.0 concepts for interfacing with server-side speech resources. The RESTful speech processing system can be used to add customizable speech processing capabilities to Web 2.0 instances, such as WIKIs, BLOGs, social networking sites, FOLKSONOMIEs, MASHUPs, and the like. The invention can access speech-enabled applications via introspection documents. Each speech-enabled application can contain a collection of entries and resources. The entries can include Web 2.0 entries, such as WIKI entries, and the resources can include speech resources, such as speech recognition, speech synthesis, speech identification, and voice interpreter resources. Each entry and resource can be further decomposed into sub-components specified at a lower granularity level. Each application resource/entry can be introspected, customized, replaced, added, re-ordered, and/or removed by end users.

The present invention can be implemented in accordance with numerous aspects consistent with the material presented herein. For example, one aspect of the present invention can include a speech processing system that includes a client, a speech for Web 2.0 system, and a speech processing system. The client can access a speech-enabled application using at least one Web 2.0 communication protocol. For example, a standard browser of the client can use a HyperText Transfer Protocol (HTTP) to communicate with the speech-enabled application executing on the speech for Web 2.0 system. The speech for Web 2.0 system can access a data store within which user specific speech parameters are included, wherein a user of the client is able to configure the specific speech parameters of the data store. For example, a user can configure which speech resources are available (e.g., TTS, ASR, SIV, VoiceXML interpreter, and the like), resource characteristics (language, grammar, voice gender, speaking rate, and the like), delivery characteristics (real-time or not, synchronous or not, delivery protocol, delivery codec, delivery fidelity, and the like), and other such characteristics. Suitable ones of these speech parameters are utilized whenever the user interacts with the Web 2.0 system. The speech processing system can include one or more speech processing engines. The speech processing system can interact with the speech for Web 2.0 system to handle speech processing tasks associated with the speech-enabled application.
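The user specific speech parameters described above can be pictured as a small per-user record. The following is a minimal sketch only; the class names, field names, and default values are illustrative assumptions and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SpeechParameters:
    """Hypothetical record of one user's configurable speech parameters."""

    # Which speech resources the user has made available (e.g., "TTS", "ASR").
    available_resources: Set[str] = field(default_factory=lambda: {"TTS", "ASR"})
    # Resource characteristics such as language, grammar, or speaking rate.
    resource_characteristics: Dict[str, str] = field(default_factory=dict)
    # Delivery characteristics such as codec, protocol, or synchronous delivery.
    delivery_characteristics: Dict[str, str] = field(default_factory=dict)


class ParameterStore:
    """Hypothetical data store keyed by a user identifier."""

    def __init__(self) -> None:
        self._by_user: Dict[str, SpeechParameters] = {}

    def configure(self, user_id: str, params: SpeechParameters) -> None:
        # The user writes his or her own parameters into the store.
        self._by_user[user_id] = params

    def lookup(self, user_id: str) -> SpeechParameters:
        # Fall back to default parameters when the user has configured nothing.
        return self._by_user.get(user_id, SpeechParameters())
```

In this sketch, a speech-enabled application would call `lookup` with the identity of the current user before issuing any speech request, so that the user's own settings govern the interaction.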

Another aspect of the present invention can include a system for using Web 2.0 as an interface to speech engines. The system can include a Web 2.0 server and a server-side speech processing system. The Web 2.0 server can serve at least one speech-enabled application to at least one remotely located client. The server-side speech processing system can handle speech processing operations for the speech-enabled applications. Communications with the server-side speech processing system can occur via a set of RESTful commands, such as GET, PUT, POST, and DELETE.

Still another aspect of the present invention can include a speech for Web 2.0 system that includes a Web 2.0 server. The Web 2.0 server can serve at least one speech-enabled application to remotely located clients. The speech-enabled application can include an introspection document, a collection of entries, and a collection of resources. At least one of the resources can be a speech resource associated with a speech engine, which adds a speech processing capability to the speech-enabled application.

It should be noted that various aspects of the invention can be implemented as a program for controlling computing equipment to implement the functions described herein, or a program for enabling computing equipment to perform processes corresponding to the steps disclosed herein. This program may be provided by storing the program in a magnetic disk, an optical disk, a semiconductor memory, or any other recording medium. The program can also be provided as a digitally encoded signal conveyed via a carrier wave. The described program can be a single program or can be implemented as multiple subprograms, each of which interacts within a single computing device or interacts in a distributed fashion across a network space.

It should also be noted that the methods detailed herein can also be methods performed at least in part by a service agent and/or a machine manipulated by a service agent in response to a service request.

BRIEF DESCRIPTION OF THE DRAWINGS

There are shown in the drawings embodiments which are presently preferred, it being understood, however, that the invention is not limited to the precise arrangements and instrumentalities shown.

FIG. 1 is a schematic diagram of a system that utilizes Web 2.0 concepts for speech processing operations in accordance with an embodiment of the inventive arrangements disclosed herein.

FIG. 2 is a schematic diagram of a system for a Web 2.0 for voice system in accordance with an embodiment of the inventive arrangements disclosed herein.

FIG. 3 is a schematic diagram showing a WIKI server adapted for communications with a Web 2.0 for voice system in accordance with an embodiment of the inventive arrangements disclosed herein.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 is a schematic diagram of a system 100 that utilizes Web 2.0 concepts for speech processing operations in accordance with an embodiment of the inventive arrangements disclosed herein. In system 100, a user 110 can use an interface 114 of client 112 to communicate with the speech for Web 2.0 system 120, which can include a Web 2.0 server 122 and/or a RESTful server 130. When the client 112 is a basic computing device (e.g., a telephone), a middleware server 116 can provide an interface 118 to system 120. Interface 114 and/or 118 can be a Web or voice browser, which communicates directly with system 120 using Web 2.0 conventions. Applications 126, which the client 112 accesses, can be voice-enabled applications stored in data store 124. A type of browser (e.g., interface 114 and/or 118) used to access the applications 126 can be transparent to the system 120, or can be transparent at least to RESTful server 130 of system 120.

The RESTful server 130 can provide speech processing operations for applications 126 by interfacing with speech processing system 150. Communications between the Web 2.0 server 122 and the RESTful server 130 can be REST based communications, such as those conducted using the ATOM PUBLISHING PROTOCOL (APP). In one embodiment, servers 122 and 130 can be functionally integrated into a single server of speech for Web 2.0 system 120.

The RESTful server 130 can utilize a set of basic commands enabling the command engine 132 to conduct speech processing operations. The commands can be REST commands that include an HTTP GET, an HTTP POST, an HTTP PUT, and an HTTP DELETE command. The RESTful server 130 can also include an introspection/discovery engine 134 and/or a media engine 136 as well as data store 138.

Data store 138 can include a set of documents 140, such as introspection documents 142, entry collection documents 144, and resource collection documents 146. The documents 140 together can link the RESTful server 130 to speech processing engines 156 of speech processing server 150 and can control behavior of speech processing server 150. The documents 140 and resulting behavior of the speech processing server 150 can be configured by user 110 in a user-specific manner. That is, different users 110 can inject their own voice characteristics, markup, behavior, and/or other features, which the speech processing system 150 utilizes.
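One plausible shape for an introspection document that points at an entry collection and a resource collection is an APP service document. The sketch below builds and parses such a document with the Python standard library. The XML namespaces are those of the published Atom and APP specifications; the function names, titles, and URIs are illustrative assumptions rather than details of the disclosed documents 140.

```python
import xml.etree.ElementTree as ET

APP_NS = "http://www.w3.org/2007/app"    # Atom Publishing Protocol namespace
ATOM_NS = "http://www.w3.org/2005/Atom"  # Atom Syndication Format namespace


def build_introspection_document(title, collections):
    """Return an APP-style service document as a string.

    `collections` maps a human-readable title to a collection URI.
    """
    ET.register_namespace("app", APP_NS)
    ET.register_namespace("atom", ATOM_NS)
    service = ET.Element(f"{{{APP_NS}}}service")
    workspace = ET.SubElement(service, f"{{{APP_NS}}}workspace")
    ET.SubElement(workspace, f"{{{ATOM_NS}}}title").text = title
    for name, href in collections.items():
        coll = ET.SubElement(workspace, f"{{{APP_NS}}}collection", href=href)
        ET.SubElement(coll, f"{{{ATOM_NS}}}title").text = name
    return ET.tostring(service, encoding="unicode")


def list_collections(document):
    """Discover collection URIs by parsing an introspection document."""
    root = ET.fromstring(document)
    return {
        coll.find(f"{{{ATOM_NS}}}title").text: coll.get("href")
        for coll in root.iter(f"{{{APP_NS}}}collection")
    }
```

A client that knows only the location of the introspection document can call `list_collections` to find the entry and resource collections, which is the discovery step that introspection is meant to enable.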

The Web 2.0 system 120 can be communicatively linked to one or more enterprise servers 158 having an associated data store 160. Thus, the Web 2.0 system 120 can be a communication intermediary which provides user 110 with access to information and services of the enterprise server 158 and data store 160.

Web 2.0 system 120 can further be communicatively linked to one or more additional RESTful servers 162, each associated with a data store 164, within which a set of documents, approximately equivalent to documents 140, are stored. Communications between Web 2.0 system 120 and speech processing system 150 or RESTful server 162 can be based on a RESTful protocol, such as APP.

It should be appreciated that RESTful servers 130 and 162 are able to operate in a stateless fashion which permits RESTful server 162 to seamlessly replace functionality of server 130. That is, state information does not have to be transferred when control is transferred from one server 130 to another 162. Thus, system 100 provides a highly scalable solution (i.e., when under a heavy load, server 130 can transfer load to server 162) and can provide fault tolerance and recovery capabilities (i.e., when server 130 experiences runtime problems, a different operational server 162 can immediately perform operations previously handled by server 130).
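The consequence of stateless operation can be illustrated with a short sketch in which every request carries all of the information needed to serve it, so any operational server can take over from another. The function names, the request layout, and the health flags are illustrative assumptions; the disclosure does not specify a failover mechanism.

```python
from typing import Callable, Dict, List

Request = Dict[str, str]
Handler = Callable[[Request], str]


def make_server(name: str) -> Handler:
    """Create a stateless handler; it keeps nothing between calls."""

    def handle(request: Request) -> str:
        # Everything needed to answer is inside the request itself.
        return f"{name} handled {request['method']} {request['uri']}"

    return handle


def dispatch(request: Request, servers: List[Handler], healthy: List[bool]) -> str:
    """Send the request to the first operational server."""
    for server, ok in zip(servers, healthy):
        if ok:
            return server(request)
    raise RuntimeError("no operational server available")
```

Because no session state lives in a handler, switching from the first server to the second requires no hand-over step; the same request simply goes to a different handler.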

Another point about system 100 that should be emphasized is that client 112 is able to interact with the speech-enabled application 126 using Web 2.0 communication protocols only. No special client-side speech interface is required. At the same time, the user 110 is able to customize/personalize/configure speech processing behavior at low levels.

As used herein, Web 2.0 is a concept that refers to a cooperative Web in which end-users 110 add value by providing content, as opposed to Web systems that unidirectionally provide information from an information provider to an information consumer. In other words, Web 2.0 refers to a readable, writable, and updateable Web. While a myriad of types of Web 2.0 instances exist, some currently popular ones include WIKIs, BLOGs, MASHUPs, FOLKSONOMIEs, social networking sites, and the like.

REST refers to a Representational State Transfer architecture. A REST approach focuses on utilizing a constrained operation set, such as GET, PUT, POST, and DELETE, to act against a set of structured targets which can be URL addressable. A REST architecture is a client/server architecture which is stateless, cacheable, and layered by nature. REST replaces a paradigm of do-something with a make-something-so concept. That is, instead of attempting to execute a kind of state transition for a software object, the REST concept changes a state of a software object to a user designated state. A RESTful object (e.g., RESTful server 130, 162) is one which primarily conforms to REST concepts. A RESTful interface can be a simple interface that transmits domain-specific data using an HTTP based protocol without utilizing an additional messaging layer, such as SOAP, and without reliance on session-tracking HTTP cookies.
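The difference between do-something and make-something-so can be shown with two toy objects. Both classes and the `rate` attribute are hypothetical and exist only to contrast the two styles.

```python
class DoSomethingVoice:
    """RPC-flavoured object: callers trigger state transitions."""

    def __init__(self) -> None:
        self.rate = 1.0

    def speed_up(self, step: float) -> None:
        # The outcome depends on whatever state the object was already in.
        self.rate += step


class MakeSoVoice:
    """REST-flavoured object: callers transfer the state they want."""

    def __init__(self) -> None:
        self._state = {"rate": 1.0}

    def put(self, desired: dict) -> None:
        # Replace the state wholesale; repeating the call changes nothing.
        self._state = dict(desired)

    def get(self) -> dict:
        # Return a copy so callers cannot mutate the stored state directly.
        return dict(self._state)
```

Calling `speed_up(0.5)` twice yields a different rate than calling it once, whereas calling `put({"rate": 1.5})` any number of times leaves the object in the same user designated state.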

The client 112 can be any computing device capable of communicating with either the system 120 or middleware server 116. In one embodiment, client 112 can include a Web browser 114, which operates as an interface between the user 110 and the system 120. In another embodiment, the client 112 can be a voice communication device that communicates with the middleware server 116, which can include a voice browser 118. In these embodiments, specific instances of the client 112 can include a computer, a Web station, a media player, a telephone, a smart phone, and the like.

Web 2.0 server 120 can be a server 120 that provides Web content to interface 114 and/or 118 and which permits a user 110 to provide additional Web content, which is made available to other users. The Web 2.0 server can be a WIKI server, a BLOG server, a social networking server, a MASHUP server, a FOLKSONOMY server, and the like. In one embodiment, the Web 2.0 server 120 can be a RESTful server, in which case functionality shown for server 130 can be incorporated within server 120. Alternatively, a transformer can be included in the Web 2.0 server, which converts content between a server-specific format (e.g., a WIKI format) and a RESTful format, such as a format adhering to an APP based protocol.

RESTful servers 130 and 162 can each be a server adhering to REST concepts, which links the server 120 to speech processing server 150. In one embodiment, the RESTful server 130 can be an APP server. RESTful commands can be issued by command engine 132, which are received and processed by command interpreter 154. A media interface 136 of the RESTful server 130 can control caching, delivery, fidelity, and formatting of delivered media, which includes delivered speech. Delivery can be in accordance with a streaming protocol, a file based protocol, a real-time protocol, and the like.

Speech processing server 150 can be any networked server or speech processing system which is able to process speech requests using one or more speech engines 156. In one embodiment, the speech processing server 150 can be a turn-based and/or clustered system capable of handling multiple requests in real-time. For example, speech processing server 150 can be implemented as a WEBSPHERE VOICE SERVER or other such commercially available product. Management tasks of the server 150 can be handled by the management processor 152. The various speech engines 156 can include ASR, TTS, SIV, voice markup interpreters, and the like.

Data stores 124, 138, 160, and 164 can each be a physical or virtual storage space configured to store digital information. Data stores 124, 138, 160, and 164 can be physically implemented within any type of hardware including, but not limited to, a magnetic disk, an optical disk, a semiconductor memory, a digitally encoded plastic memory, a holographic memory, or any other recording medium. Each of the data stores 124, 138, 160, and 164 can be a stand-alone storage unit as well as a storage unit formed from a plurality of physical devices. Additionally, information can be stored within data stores 124, 138, 160, and 164 in a variety of manners. For example, information can be stored within a database structure or can be stored within one or more files of a file storage system, where each file may or may not be indexed for information searching purposes. Further, data stores 124, 138, 160, and 164 can utilize one or more encryption mechanisms to protect stored information from unauthorized access.

The components of system 100 can be communicatively linked to each other via a network (not shown). The network can include any hardware, software, and firmware necessary to convey data encoded within carrier waves. Data can be contained within analog or digital signals and conveyed through data or voice channels. The network can include local components and data pathways necessary for communications to be exchanged among computing device components and between integrated device components and peripheral devices. The network can also include network equipment, such as routers, data lines, hubs, and intermediary servers which together form a data network, such as the Internet. The network can also include circuit-based communication components and mobile communication components, such as telephony switches, modems, cellular communication towers, and the like. The network can include line based and/or wireless communication pathways.

FIG. 2 is a schematic diagram of a system 200 for a Web 2.0 for voice system 230 in accordance with an embodiment of the inventive arrangements disclosed herein. System 200 can be an alternative representation and/or an embodiment for the system 100 of FIG. 1 or for a system that provides approximately equivalent functionality as system 100 utilizing Web 2.0 concepts to provide speech processing capabilities.

In system 200, Web 2.0 clients 240 can communicate with Web 2.0 servers 210-214 utilizing a REST/ATOM 250 protocol. The Web 2.0 servers 210-214 can serve one or more speech-enabled applications 220-224, where speech resources are provided by a Web 2.0 for Voice system 230. One or more of the applications 220-224 can include AJAX 256 or other JavaScript code. In one embodiment, the AJAX 256 code can be automatically converted from WIKI or other syntax by a transformer of a server 210-214.

Communications between the Web 2.0 servers 210-214 and system 230 can be in accordance with REST/ATOM 256 protocols. Each speech-enabled application 220-224 can be associated with an ATOM container 231, which specifies Web 2.0 items 232, resources 233, and media 234. One or more resources 233 can correspond to a speech engine 238.

The Web 2.0 clients 240 can be any client capable of interfacing with a Web 2.0 server 210-214. For example, the clients 240 can include a Web or voice browser 241 as well as any other type of interface 244, which executes upon a computing device. The computing device can include a mobile telephone 242, a mobile computer 243, a laptop, a media player, a desktop computer, a two-way radio, a line-based phone, and the like. Unlike conventional speech clients, the clients 240 need not have a speech-specific interface and instead only require a standard Web 2.0 interface. That is, there are no assumptions regarding the client 240 other than an ability to communicate with a Web 2.0 server 210-214 using Web 2.0 conventions.

The Web 2.0 servers 210-214 can be any server that provides Web 2.0 content to clients 240 and that provides speech processing capabilities through the Web 2.0 for voice system 230. The Web 2.0 servers can include a WIKI server 210, a BLOG server 212, a MASHUP server, a FOLKSONOMY server, a social networking server, and any other Web 2.0 server 214.

The Web 2.0 for voice system 230 can utilize Web 2.0 concepts to provide speech capabilities. A server-side interface is established between the voice system 230 and a set of Web 2.0 servers 210-214. Available speech resources can be introspected and discovered via introspection documents, which are one of the Web 2.0 items 232. Introspection can be in accordance with the APP specification or a similar protocol. The ability for dynamic configuration and installation is exposed to the servers 210-214 via the introspection document.

That is, access to Web 2.0 for voice system 230 can be through a Web 2.0 server that lets users (e.g., clients 240) provide their own customizations/personalizations. Appreciably, use of the APP 256 opens up the application interface to speech resources using Web 2.0, JAVA 2 ENTERPRISE EDITION (J2EE), WEBSPHERE APPLICATION SERVER (WAS), and other conventions, rather than being restricted to protocols such as media resource control protocol (MRCP), real time streaming protocol (RTSP), or real-time transport protocol (RTP).

A constrained set of RESTful commands can be used to interface with the Web 2.0 for voice system 230. RESTful commands can include a GET command, a POST command, a PUT command, and a DELETE command, each of which is able to be implemented as an HTTP command. As applied to speech, GET (e.g., HTTP GET) can return capabilities and elements that are modifiable. The GET command can also be used for submitting simplistic speech queries and for receiving query results.

The POST command can create media-related resources using speech engines 238. For example, the POST command can create an audio “file” from input text using a text-to-speech (TTS) resource 233 which is linked to a TTS engine 238. The POST command can create a text representation given an audio input, using an automatic speech recognition (ASR) resource 233 which is linked to an ASR engine 238. The POST command can create a score given an audio input, using a Speaker Identification and Verification (SIV) resource which is linked to a SIV engine 238. Any type of speech processing resource can be similarly accessed using the POST command.

The PUT command can be used to update configuration of speech resources (e.g., default voice-name, ASR or TTS language, TTS voice, media destination, media delivery type, etc.). The PUT command can also be used to add a resource or capability to a Web 2.0 server 210-214 (e.g., installing an SIV component). The DELETE command can remove a speech resource from a configuration. For example, the DELETE command can be used to uninstall a previously installed speech component.
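The roles described for GET, POST, PUT, and DELETE can be gathered into a small in-memory sketch. It is not the disclosed implementation: the class, its method signatures, and the use of plain callables in place of real TTS, ASR, or SIV engines are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional

Engine = Callable[[str], str]


class SpeechResourceServer:
    """Toy dispatcher that maps the four commands onto speech resources."""

    def __init__(self) -> None:
        self._engines: Dict[str, Engine] = {}
        self._config: Dict[str, Dict[str, str]] = {}

    def capabilities(self) -> List[str]:
        # GET on the server: report which speech resources are installed.
        return sorted(self._engines)

    def get(self, resource: str) -> Dict[str, str]:
        # GET on a resource: return its modifiable configuration.
        return dict(self._config[resource])

    def post(self, resource: str, payload: str) -> str:
        # POST: create a result (e.g., audio from text) via the linked engine.
        return self._engines[resource](payload)

    def put(self, resource: str, config: Dict[str, str],
            engine: Optional[Engine] = None) -> None:
        # PUT: install a resource when an engine is supplied, else update it.
        if engine is not None:
            self._engines[resource] = engine
        elif resource not in self._engines:
            raise KeyError(resource)
        self._config.setdefault(resource, {}).update(config)

    def delete(self, resource: str) -> None:
        # DELETE: uninstall the resource and drop its configuration.
        self._engines.pop(resource, None)
        self._config.pop(resource, None)
```

For instance, a caller could install a stand-in TTS resource with `put("tts", {"language": "en-US"}, engine=...)`, adjust it later with a second `put`, synthesize with `post("tts", "hello")`, and uninstall it with `delete("tts")`.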

The Web 2.0 for Voice system 230 is an extremely flexible solution that permits users (of clients 240) to customize numerous speech processing elements. Customizable speech processing elements can include speech resource availability, request characteristics, result characteristics, media characteristics, and the like. Speech resource availability can indicate whether a specific type of resource (e.g., ASR, TTS, SIV, VoiceXML interpreter) is available. Request characteristics can refer to characteristics such as language, grammar, voice attributes, gender, rate of speech, and the like. The result characteristics can specify whether results are to be delivered synchronously or asynchronously. Result characteristics can alternatively indicate whether a listener for callback is to be supplied with results. Media characteristics can include input and output characteristics, which can vary from a URI reference to an RTP stream. The media characteristics can specify a codec (e.g., G711), a sample rate (e.g., 8 kHz to 22 kHz), and the like. In one configuration, the speech engines 238 can be provided from a J2EE environment 236, such as a WAS environment. This environment 236 can conform to a J2EE Connector Architecture (JCA) 237.

In one embodiment, a set of additional facades 260 can be utilized on top of Web 2.0 protocols to provide additional interface and protocol 262 options (e.g., MRCP, RTSP, RTP, Session Initiation Protocol (SIP), etc.) to the Web 2.0 for voice system 230. Use of facades 260 can enable legacy access/use of the Web 2.0 for voice system 230. The facades 260 can be designed to segment the protocol 262 from underlying details so that characteristics of the facade do not bleed through to speech implementation details. Functions, such as the WAS 6.1 channel framework or a JCA container, can be used to plug in a protocol which is not native to the J2EE environment 236. The media component 234 of the container 231 can be used to handle media storage, delivery, and format conversions as necessary. Facades 260 can be used for asynchronous or synchronous protocols 262.

FIG. 3 is a schematic diagram showing a WIKI server 330 adapted for communications with a Web 2.0 for voice system 310 in accordance with an embodiment of the inventive arrangements disclosed herein. Although a WIKI server 330 is illustrated, server 330 can be any Web 2.0 server (e.g., server 120 of system 100 or server 210-214 of system 200) including, but not limited to, a BLOG server, a MASHUP server, a FOLKSONOMY server, a social networking server, and the like.

In the system 300, a browser 320 can communicate with Web 2.0 server 330 via a Representational State Transfer (REST) architecture/ATOM 304 based protocol. The Web 2.0 server 330 can communicate with a speech for Web 2.0 system 310 via a REST/ATOM 302 based protocol. Protocols 302, 304 can include HTTP and similar protocols that are RESTful by nature as well as an Atom Publishing Protocol (APP) or other protocol that is specifically designed to conform to REST principles.

The Web 2.0 server 330 can include a data store 332 in which applications 334, which can be speech-enabled, are stored. In one embodiment, the applications 334 can be written in a WIKI or other Web 2.0 syntax and can be stored in an APP format.

The contents of the application 334 can be accessed and modified using editor 350. The editor 350 can be a standard WIKI or other Web 2.0 editor having a voice plug-in or extensions 352. In one implementation, user-specific modifications made to the speech-enabled application 334 via the editor 350 can be stored in a customization data store as a customization profile and/or a state definition. The customization profile and state definition can contain customization settings that can override entries contained within the original application 334. Customizations can be related to a particular user or set of users.
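The override relationship between a customization profile and the entries of the original application can be sketched as a layered merge. The function name, the setting keys, and the idea of stacking a group profile under a user profile are assumptions for illustration; the disclosure states only that customization settings can override original entries.

```python
from typing import Dict, Mapping


def effective_settings(application_entries: Mapping[str, str],
                       *profiles: Mapping[str, str]) -> Dict[str, str]:
    """Apply customization profiles over the original application entries.

    Later profiles take precedence over earlier ones, and the original
    entries are left unmodified.
    """
    merged = dict(application_entries)
    for profile in profiles:
        merged.update(profile)
    return merged
```

With original entries `{"voice": "default", "language": "en-US"}` and a user profile `{"voice": "female"}`, the effective settings keep the original language and take the voice from the profile.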

The transformer 340 can convert WIKI or other Web 2.0 syntax into standard markup for browsers. In one embodiment, the transformer 340 can be an extension of a conventional transformer that supports HTML and XML. The extended transformer 340 can be enhanced to handle JavaScript, such as AJAX. For example, resource links of application 334 can be converted into AJAX functions by the transformer 340 having an AJAX plug-in 342. The transformer 340 can also include a VoiceXML plug-in 344, which generates VoiceXML markup for voice-only clients.
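A transformer with plug-ins can be sketched as a chain of rewriting functions applied in order. The wiki syntax used here (`'''bold'''` and `[[resource:name]]`) and the emitted markup are hypothetical; in particular, the resource-link plug-in only emits a placeholder element that client-side script could later bind to, rather than generating actual AJAX functions.

```python
import re
from typing import Callable, List

Plugin = Callable[[str], str]


def bold_plugin(text: str) -> str:
    # Base conversion: '''text''' becomes HTML bold markup.
    return re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)


def resource_link_plugin(text: str) -> str:
    # Hypothetical plug-in: [[resource:name]] becomes a placeholder element
    # carrying the resource name in a data attribute.
    return re.sub(
        r"\[\[resource:(\w+)\]\]",
        r'<span class="speech-resource" data-resource="\1"></span>',
        text,
    )


class Transformer:
    """Applies each registered plug-in to the source text in order."""

    def __init__(self, plugins: List[Plugin]) -> None:
        self._plugins = list(plugins)

    def transform(self, source: str) -> str:
        for plugin in self._plugins:
            source = plugin(source)
        return source
```

Supporting a different class of client, such as a voice-only client, would then amount to registering a different set of plug-ins with the same `Transformer`.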

The present invention may be realized in hardware, software, or a combination of hardware and software. The present invention may be realized in a centralized fashion in one computer system or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited. A typical combination of hardware and software may be a general purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein.

The present invention also may be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which when loaded in a computer system is able to carry out these methods. Computer program in the present context means any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form.

This invention may be embodied in other forms without departing from the spirit or essential attributes thereof. Accordingly, reference should be made to the following claims, rather than to the foregoing specification, as indicating the scope of the invention.

1. A speech processing system comprising: a client configured to access a speech-enabled application using at least one Web 2.0 communication protocol; a speech for Web 2.0 system within which the speech-enabled application executes, said speech for Web 2.0 system accessing a data store within which user specific speech parameters are included, wherein a user of the client is able to configure the specific speech parameters of the data store associated with the user, and wherein the speech-enabled application executes in accordance with the specific speech parameters corresponding to the user of the client; and a speech processing system comprising a plurality of speech processing engines, wherein the speech processing system interacts with the speech for Web 2.0 system to handle speech processing tasks associated with the speech-enabled application.
 2. The system of claim 1, wherein the specific speech parameters specify at least one of speech resource availability, speech resource characteristics, and speech delivery characteristics.
 3. The system of claim 1, wherein the Web 2.0 communication protocol is a Hypertext Transfer Protocol (HTTP) based protocol, and wherein the speech processing system interfaces with the speech for Web 2.0 system using an Atom Publication Protocol (APP) based protocol.
 4. The system of claim 1, wherein interactions between the speech processing system and the speech for Web 2.0 system occur through one of four RESTful commands, said RESTful commands comprising a GET command, a POST command, a PUT command, and a DELETE command.
 5. The system of claim 1, wherein said speech-enabled application comprises at least one introspection document, which is used to enable the client to configure the specific speech parameters.
 6. The system of claim 1, wherein the speech-enabled application comprises two collections, one of these collections comprising at least one entry, each entry defining content that is presented to the client, the other one of the collections comprising a collection of resources that include speech processing resources, wherein a one-to-one relationship exists between the speech processing resources of the collection of resources and a type of speech processing engine of the speech processing system to which the speech processing resource corresponds, said types of speech processing engines including at least two of a recognition engine, a text-to-speech engine, a speech identification and verification (SIV) engine, and a VoiceXML interpreter.
 7. The system of claim 1, wherein the speech-enabled application is at least one of a WIKI, a BLOG, a MASHUP, a social networking application, and a FOLKSONOMY.
 8. The system of claim 1, wherein the client comprises a standard Web browser through which the client interfaces with the speech for Web 2.0 system, wherein the Web 2.0 communication protocol is directly supported by the standard Web browser.
 9. The system of claim 1, further comprising: a middleware server comprising a standard voice browser, wherein said client interacts with the middleware server over a real-time voice communication channel, wherein the standard voice browser interfaces with the speech for Web 2.0 system, wherein the Web 2.0 communication protocol is directly supported by the standard voice browser.
 10. The system of claim 1, further comprising: an enterprise server comprising enterprise content, wherein the enterprise server interacts with the speech for Web 2.0 system to permit the client to access the enterprise content by interacting with the speech-enabled application.
 11. A system for using Web 2.0 as an interface to speech engines comprising: a Web 2.0 server configured to serve at least one speech-enabled application to at least one remotely located client; and a server-side speech processing system configured to handle speech processing operations for the at least one speech-enabled application, wherein communications with the server-side speech processing system occur via a set of RESTful commands.
 12. The system of claim 11, wherein the Web 2.0 server utilizes at least one introspection document associated with the speech-enabled application for introspection and discovery of speech resources and to configure the speech resources.
 13. The system of claim 12, wherein the introspection document and the RESTful commands conform to an Atom Publication Protocol (APP) based specification.
 14. The system of claim 11, wherein the set of RESTful commands comprises an HTTP GET command, an HTTP POST command, an HTTP PUT command, and an HTTP DELETE command.
 15. The system of claim 14, wherein said GET command selectively returns modifiable speech processing capabilities and elements, said GET command also selectively returning speech query results, wherein said POST command selectively provides input to a speech engine and returns output from the speech engine, said output being a processed result of the input, wherein said PUT command selectively updates speech resources for a configuration, said PUT command also selectively installing a speech resource for a configuration, and wherein said DELETE command selectively removes a speech resource from a configuration.
 16. The system of claim 11, wherein the set of RESTful commands consists of an HTTP GET command, an HTTP POST command, an HTTP PUT command, and an HTTP DELETE command.
 17. A speech for Web 2.0 system comprising: a Web 2.0 server configured to serve at least one speech-enabled application to remotely located clients, said speech-enabled application comprising an introspection document, a collection of entries, and a collection of resources, wherein at least one of the resources is a speech resource associated with a speech engine, which adds a speech processing capability to the speech-enabled application.
 18. The system of claim 17, wherein the speech-enabled application conforms to an Atom Publication Protocol (APP) based specification.
 19. The system of claim 17, wherein the speech engine is a turn-based speech processing engine executing within a JAVA 2 ENTERPRISE EDITION (J2EE) middleware environment.
 20. The system of claim 17, wherein the Web 2.0 server is configured so that end-users are able to introspect, customize, replace, add, re-order, and remove entries and resources in the collections. 